
Conversation


@hengtaoguo hengtaoguo commented Dec 6, 2025

Description

Reduce user friction in SFT/RL and fix broken links.

b/463394566
b/463409639
b/463409807
b/463396352
b/463393644

Tests

N/A

Checklist

Before submitting this PR, please make sure (put X in square brackets):

- [ ] I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
- [ ] I have added necessary comments in my code, particularly in hard-to-understand areas.
- [ ] I have run end-to-end tests and provided workload links above if applicable.
- [ ] I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@hengtaoguo hengtaoguo force-pushed the hengtaoguo-grpo branch 2 times, most recently from 8629e8b to 5ae647f on December 10, 2025 06:32
@hengtaoguo hengtaoguo changed the title More UXR fixes Docs: Improve SFT/RL user experience Dec 10, 2025
@hengtaoguo hengtaoguo marked this pull request as ready for review December 10, 2025 18:11

## Create virtual environment and Install MaxText dependencies
- If you have already completed the [MaxText installation](https://github.com/AI-Hypercomputer/maxtext/blob/main/docs/guides/install_maxtext.md), you can skip to the next section for post-training dependencies installations. Otherwise, please install `MaxText` using the following commands before proceeding.
+ If you have already completed the [MaxText installation](../../install_maxtext.md), you can skip to the next section for post-training dependencies installations. Otherwise, please install `MaxText` using the following commands before proceeding.

Why do we need to change the link here?
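
For reference, the install flow that section points to is, in rough sketch, the usual clone-plus-venv pattern (the `pip install -e .` step is an assumption here; the authoritative commands live in `install_maxtext.md`):

```bash
# Sketch only -- see install_maxtext.md for the authoritative steps.
git clone https://github.com/AI-Hypercomputer/maxtext.git
cd maxtext
python3 -m venv ~/venv-maxtext          # create an isolated virtual environment
source ~/venv-maxtext/bin/activate      # activate it
pip install -e .                        # assumed editable install; the guide may use a setup script instead
```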


export RUN_NAME=<name for this run> # e.g., $(date +%Y-%m-%d-%H-%M-%S)
- export MAXTEXT_CKPT_PATH=${BASE_OUTPUT_DIRECTORY}/${RUN_NAME}/0/items
+ export MAXTEXT_CKPT_PATH=${BASE_OUTPUT_DIRECTORY}/${RUN_NAME}/0/items # Actual checkpoint saved with an extra /0/items path suffix

This doesn't look right if the user has the checkpoint in GCS. We can remove this env variable from here and move it to the next section, similar to https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/sft.html#get-your-model-checkpoint.
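
As an illustration of the suggested pattern, a checkpoint in GCS could be wired up along these lines (the bucket and run names below are hypothetical placeholders):

```bash
# Hypothetical example: point MAXTEXT_CKPT_PATH at an existing checkpoint,
# whether it lives locally or in a GCS bucket.
export BASE_OUTPUT_DIRECTORY=gs://my-bucket/maxtext-runs   # hypothetical bucket
export RUN_NAME=2025-12-10-06-32                           # hypothetical run name
# Orbax saves the actual checkpoint under an extra /0/items suffix:
export MAXTEXT_CKPT_PATH=${BASE_OUTPUT_DIRECTORY}/${RUN_NAME}/0/items
```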

The overview of what this run will do is as follows:

- 1. We load a policy model and a reference model. Both are copies of `Llama3.1-8b-Instruct`.
+ 1. We load a policy model and a reference model. Both are copies of the model checkpoint you specified (e.g., `Llama3.1-8b-Instruct`).

Can you do the same at line 128?


## 2. Install XPK
- Install XPK by following the instructions in the [official documentation](https://github.com/AI-Hypercomputer/xpk?tab=readme-ov-file#installation-via-pip).
+ Install XPK by following the instructions in the [official documentation](https://github.com/AI-Hypercomputer/xpk?tab=readme-ov-file#installation-via-pip). We also provide a quick guide for XPK installation and usage [here](https://maxtext.readthedocs.io/en/latest/run_maxtext/run_maxtext_via_xpk.html).

nit: "We also provide a quick guide for XPK installation here."
That XPK documentation mainly covers pre-training, so pointing users to it at this point might create confusion. Can we explicitly say to follow that guide only for the XPK installation & prerequisites, and to continue with the current doc for post-training?
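
For context, the pip route in the official documentation amounts to a single install command, sketched below (check the XPK README for version prerequisites):

```bash
# Install XPK from PyPI, per the "installation via pip" section of the XPK README.
pip install xpk
xpk --help   # sanity-check the installation
```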

## Submit your RL workload via Pathways

- Please create a pathways ready GKE cluster as described [here](https://docs.cloud.google.com/ai-hypercomputer/docs/workloads/pathways-on-cloud/create-gke-cluster), and you can submit the `train_rl.py` script via [XPK](https://github.com/AI-Hypercomputer/xpk).
+ Please create a pathways ready GKE cluster as described [here](https://docs.cloud.google.com/ai-hypercomputer/docs/workloads/pathways-on-cloud/create-gke-cluster), and you can submit the `train_rl.py` script via [XPK](https://github.com/AI-Hypercomputer/xpk). We also provide a quick guide for XPK installation and usage [here](../../run_maxtext/run_maxtext_via_xpk.md).

Similar comment
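
For illustration only, an XPK submission of `train_rl.py` might look roughly like the sketch below; the cluster name, TPU type, workload name, and flag values are assumptions, not values from this PR:

```bash
# Hypothetical sketch of submitting train_rl.py to a Pathways-ready GKE
# cluster via XPK; names and flags are placeholders, not verified values.
xpk workload create-pathways \
  --cluster=my-pathways-cluster \
  --tpu-type=v5p-8 \
  --num-slices=1 \
  --workload=rl-llama3-8b \
  --command="python3 train_rl.py ..."   # args elided; see the RL tutorial
```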

